Spanish/English Cross-Lingual Categorization
Abstract
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both when a sufficient number of training examples is available for each new language and when no training examples are available for some language. We report experimental results for the bi-lingual classification of the ILO corpus (with documents in English and Spanish), obtained using bi-lingual training, terminology translation and profile-based translation. The results are compared to the respective mono-lingual baselines for three different document representations (unlemmatized, lemmatized and normalized keywords).
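As a rough illustration of the terminology-translation idea named in the abstract, the sketch below maps Spanish document terms to English through a small dictionary and then scores them against English category profiles. It is not the authors' implementation: the dictionary `ES_EN`, the category names and weights in `PROFILES`, and the overlap-based scoring are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical Spanish-to-English term dictionary (illustrative entries only).
ES_EN = {
    "trabajo": "work",
    "salario": "wage",
    "seguridad": "safety",
    "empleo": "employment",
}

# Hypothetical English category profiles: term -> weight.
# In practice such weights would be learned from English training documents.
PROFILES = {
    "wages": {"wage": 2.0, "work": 0.5},
    "occupational_safety": {"safety": 2.0, "work": 0.5},
}


def translate_terms(tokens, dictionary):
    """Map each token through the dictionary; tokens without an entry are dropped."""
    return [dictionary[t] for t in tokens if t in dictionary]


def classify(tokens, profiles):
    """Score each category as the weighted count of its profile terms in the tokens."""
    counts = Counter(tokens)
    scores = {
        category: sum(weight * counts[term] for term, weight in profile.items())
        for category, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def classify_spanish(text):
    """Translate a Spanish text term by term, then classify with English profiles."""
    tokens = text.lower().split()
    return classify(translate_terms(tokens, ES_EN), PROFILES)


if __name__ == "__main__":
    category, scores = classify_spanish("seguridad en el trabajo")
    print(category, scores)
```

The point of this design is that only the terms are translated, not whole documents, so a classifier trained on one language can be reused for a language with no training examples of its own.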
Similar resources
Cross-Lingual Text Categorization
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both in case a sufficient number of training examples is available for each new language and in the cas...
A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingu...
SWAT: Cross-Lingual Lexical Substitution using Local Context Matching, Bilingual Dictionaries and Machine Translation
We present two systems that select the most appropriate Spanish substitutes for a marked word in an English test sentence. These systems were official entries to the SemEval-2010 Cross-Lingual Lexical Substitution task. The first system, SWAT-E, finds Spanish substitutions by first finding English substitutions in the English sentence and then translating these substitutions into Spanish using ...
ALTN: Word Alignment Features for Cross-lingual Textual Entailment
We present a supervised learning approach to cross-lingual textual entailment that explores statistical word alignment models to predict entailment relations between sentences written in different languages. Our approach is language independent, and was used to participate in the CLTE task (Task#8) organized within Semeval 2013 (Negri et al., 2013). The four runs submitted, one for each languag...
MIRACLE's 2005 Approach to Cross-Lingual Question Answering
This paper presents the 2005 MIRACLE’s team approach to CLEF QA with Spanish as a target task using miraQA system. The system is based on answer extraction and uses mainly syntactic patterns and semantic information. Six runs were submitted for Spanish, English and Italian as source languages using commercial translation software. The system performs reasonably well for Spanish factual question...